1 ProteinLFQ

1.1 Packages

Load required packages into R session.

1.2 Functions

## SHA-1 hash of file is 46d72ef9fbf7843a06a8c125d64eaf3c02bfa2e5

1.3 Workflow

Load Protein Measurements spreadsheet exported from Progenesis.

Separate the leading protein accession from the group members.

  • Some protein accessions are grouped together by semicolons (;) in the data. These groups represent homologous proteins with shared peptide identifications. The first accession in each group is generally the most confident inference and is used for analysis; the remaining accessions are treated as group members.
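
The split can be sketched in base R as follows. This is a minimal illustration, not the workflow's own code: the vector `accession` and its values are hypothetical, and the original analysis may use tidyverse helpers instead.

```r
# Hypothetical accession strings; grouped entries are separated by ";"
accession <- c("P12154;P09144", "Q00914", "P50369;P23230;P12154")

# Leading accession: everything before the first semicolon
leading <- sub(";.*$", "", accession)

# Group members: everything after the first semicolon (NA when ungrouped)
members <- ifelse(grepl(";", accession),
                  sub("^[^;]*;", "", accession),
                  NA_character_)
```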

Define column indices for the normalized abundance values.

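  • The Protein Measurements export from Progenesis contains the normalized and raw abundances for each raw file in the experiment. In the column names below, the normalized abundance columns come first and the raw abundance columns repeat the same file names with a "_1" suffix.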
##  [1] "Accession"              "Group members"         
##  [3] "Peptide count"          "Unique peptides"       
##  [5] "Confidence score"       "Anova (p)"             
##  [7] "Max fold change"        "Highest mean condition"
##  [9] "Lowest mean condition"  "Description"           
## [11] "20141222_WOS521"        "20141222_WOS526"       
## [13] "20141222_WOS5211"       "20141222_WOS5216"      
## [15] "20141222_WOS522"        "20141222_WOS527"       
## [17] "20141222_WOS5212"       "20141222_WOS5217"      
## [19] "20141222_WOS523"        "20141222_WOS528"       
## [21] "20141222_WOS5213"       "20141222_WOS5218"      
## [23] "20141222_WOS521_1"      "20141222_WOS526_1"     
## [25] "20141222_WOS5211_1"     "20141222_WOS5216_1"    
## [27] "20141222_WOS522_1"      "20141222_WOS527_1"     
## [29] "20141222_WOS5212_1"     "20141222_WOS5217_1"    
## [31] "20141222_WOS523_1"      "20141222_WOS528_1"     
## [33] "20141222_WOS5213_1"     "20141222_WOS5218_1"
##  [1] 11 12 13 14 15 16 17 18 19 20 21 22
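
One way to derive these indices from the column names, rather than typing them by hand, is sketched below. It assumes the layout printed above (sample columns start right after "Description", normalized block first, raw block second); the shortened `cols` vector is a toy stand-in for the real header.

```r
# Column names in the layout printed above (shortened to two raw files here)
cols <- c("Accession", "Description",
          "20141222_WOS521", "20141222_WOS526",
          "20141222_WOS521_1", "20141222_WOS526_1")

# Sample columns start right after "Description"; the first half holds the
# normalized abundances and the second half (suffix "_1") the raw abundances
first_sample <- match("Description", cols) + 1
n_files <- (length(cols) - first_sample + 1) / 2
norm_idx <- seq(first_sample, length.out = n_files)
```

Applied to the 34 column names printed above, the same arithmetic gives columns 11 to 22.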

Filter to remove proteins from the contaminant database.

Filter to remove proteins without sufficient peptide evidence.

  • Proteins are identified from unique and shared PSMs, so confidence in a protein identification generally increases with the number of unique peptides. It is common in proteomics to remove “one-hit-wonders” by requiring > 1 unique peptide per protein.
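
A minimal base R sketch of the unique-peptide filter, using the "Unique peptides" column name from the export. The table `proteins` and its values are made up for illustration.

```r
# Toy table using the "Unique peptides" column name from the export
proteins <- data.frame(
  Accession = c("P12154", "P09144", "Q00914"),
  `Unique peptides` = c(5, 1, 2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep proteins supported by more than one unique peptide
filtered <- proteins[proteins[["Unique peptides"]] > 1, ]
```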

Select the identifier and abundance columns.

Example plot construction.

2 PeptideLFQ

Peptide-Level Quantification

  • During raw MS file processing, Progenesis subdivided each LC-MS run into peak features, which are MS1 precursor ions with a defined isotopic cluster and a characteristic retention time and monoisotopic mass. The same peak feature coordinates are assigned for every LC-MS run, and the summed intensity (abundance) from each run is recorded.

  • MS2 spectra from data-dependent acquisition (DDA) contain the associated MS1 precursor ion mass and retention time, allowing them to be mapped to peak features.

  • The Peptide Measurements export from Progenesis contains the normalized abundances, raw abundances, and spectral counts for each identified peptide that was mapped to an MS1 peak feature in each raw file.

2.1 Packages

Load required packages into R session.

2.3 Workflow

2.3.2 Redox

Load Peptide Measurements spreadsheet exported from Progenesis.

Load Protein Measurements spreadsheet exported from Progenesis.

Load protein sequence database.

##   A AAStringSet instance of length 18944
##         width seq                                      names               
##     [1]   751 MTISTPEREAKKVKIAVDR...GGIATTWSFFLARIISVG sp|P12154|PSAA_CHLRE
##     [2]   735 MATKLFPKFSQGLAQDPTT...YIFTYAAFLIASTSGRFG sp|P09144|PSAB_CHLRE
##     [3]    81 MAHIVKIYDTCIGCTQCVR...SVRVYLGSESTRSMGLSY sp|Q00914|PSAC_CHLRE
##     [4]    43 MIFDFNYIHIFMLTITSYV...LVFTLGIYLGLLKVVKLI sp|P50369|PETL_CHLRE
##     [5]   160 MSVTKKPDLSDPVLKAKLA...LGIGSTFPIDISLTLGLF sp|P23230|PETD_CHLRE
##     ...   ... ...
## [18940]   238 MSKGEELFTGVVPILVELD...LEFVTAAGITHGMDELYK sp|GFP_AEQVI
## [18941]   204 MAEEVEEERLKYLDFVRAA...LPLLPTEKITKVFGDEAS sp|SRPP_HEVBR
## [18942]   138 MAEDEDNQQGQGEGLKYLG...SSLPGQTKILAKVFYGEN sp|REF_HEVBR
## [18943]   348 MFSSVMVALVSLAVAVSAN...VMNADNHEYFSENNPAQS sp|PLMP_GRIFR
## [18944]   271 MSHIQRETSCSRPRLNSNL...DNPDMNKLQFHLMLDEFF sp|KKA1_ECOLX

Define column indices for the normalized abundance values.

##  [1] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

Filter to keep peptides with Percolator-adjusted Mascot scores > 13 (p-value < 0.05).

Filter to remove peptides from proteins in the contaminant database.

Join the protein identification statistics onto the Peptide Measurement data.

  • Each peptide was assigned to a single protein accession, which will have identification statistics in the Protein Measurements export file.

Summarize duplicate peak features.

  • Some features were matched with peptides having identical sequence, modifications, and score, but alternate protein accessions. These groups were reduced to satisfy the principle of parsimony and represented by the protein accession with the highest number of unique peptides or, when that did not resolve the group, by the protein with the largest confidence score assigned by Progenesis.

  • Some features were duplicated with differing peptide identifications and were reduced to a single peptide with the highest Mascot ion score.
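
The second reduction (one peptide per feature, chosen by highest score) can be sketched in base R. The column names `Feature`, `Sequence`, and `Score` are hypothetical placeholders for the corresponding export columns, and the values are invented.

```r
# Toy peptide table; two features carry competing identifications
peptides <- data.frame(
  Feature  = c(101, 101, 102, 103, 103),
  Sequence = c("AAK", "AGK", "CDR", "EFK", "EYK"),
  Score    = c(25, 40, 18, 33, 31),
  stringsAsFactors = FALSE
)

# Sort by descending score, then keep the first row seen for each feature
ordered <- peptides[order(-peptides$Score), ]
best <- ordered[!duplicated(ordered$Feature), ]

# Restore feature order for readability
best <- best[order(best$Feature), ]
```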

Filter to remove peptides without an indirect site of Cys oxidation.

Build an identifier term.

  • The protein accession is concatenated with the particular residue and position in the protein sequence of any modifications identified on the peptide.
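
As a rough illustration, the identifier can be built by locating the peptide in its protein sequence and shifting the modification site into protein coordinates. The inputs below are toy values, and the "_" separator and the `Accession_ResiduePosition` layout are assumptions rather than the workflow's confirmed format.

```r
# Hypothetical inputs: a protein sequence fragment, a peptide from it, and
# the modified position counted within the peptide
protein_seq <- "MAHIVKIYDTCIGCTQCVR"
peptide     <- "IYDTCIGCTQCVR"
mod_pos_pep <- 5          # modified Cys is the 5th residue of the peptide
accession   <- "Q00914"

# Locate the peptide in the protein, then shift the site to protein coordinates
pep_start    <- regexpr(peptide, protein_seq, fixed = TRUE)[1]
mod_pos_prot <- pep_start + mod_pos_pep - 1
residue      <- substr(protein_seq, mod_pos_prot, mod_pos_prot)

# Concatenate accession, residue, and protein position
identifier <- paste0(accession, "_", residue, mod_pos_prot)
```

A peptide that occurs more than once in the protein would need extra handling, since `regexpr()` reports only the first match.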

Summarize duplicate identifiers.

  • The dataset was then reduced to unique identifiers by summing the abundance of all contributing features (i.e., peptide charge states, missed cleavages, and combinations of additional variable modifications).

  • Each identifier group was represented by the peptide with the highest Mascot score in the final dataset.
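
The summing step can be sketched with `aggregate()` from base R. The table `features`, its column names, and its values are hypothetical; selecting the highest-scoring representative peptide would be a separate step.

```r
# Toy feature table: two features map to the same identifier
features <- data.frame(
  Identifier = c("Q00914_C11", "Q00914_C11", "P12154_C160"),
  Score      = c(40, 25, 30),
  Run1       = c(100, 50, 10),
  Run2       = c(200, 0, 20),
  stringsAsFactors = FALSE
)

# Sum the abundance columns across all features sharing an identifier
summed <- aggregate(cbind(Run1, Run2) ~ Identifier, data = features, FUN = sum)
```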

Simplify the data format.

  • Select the identifier and abundance columns to simplify downstream processing.
## Warning: slice_() is deprecated. 
## Please use slice() instead
## 
## The 'programming' vignette or the tidyeval book can help you
## to program with slice() : https://tidyeval.tidyverse.org
## This warning is displayed once per session.
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.

2.3.3 Variable Modification

2.3.4 Phosphorylation

3 StatLFQ

3.1 Packages

Load required packages into R session.

3.2 Functions

## SHA-1 hash of file is 21bea41dbe4f674300b2d84ef64282ad79962cf0

3.3 Workflow

Define input data.

Define the column indices for replicates in each condition.

  • This assumes the input data has the simplified format of an identifier column followed by abundance columns for each raw file in the experiment.
## [1] 2 3 4 5
## [1] 6 7 8 9
## [1] 10 11 12 13
## $`25`
## [1] 2 3 4 5
## 
## $`50`
## [1] 6 7 8 9
## 
## $`100`
## [1] 10 11 12 13
## $`25-50`
## $`25-50`[[1]]
## [1] 2 3 4 5
## 
## $`25-50`[[2]]
## [1] 6 7 8 9
## 
## 
## $`25-100`
## $`25-100`[[1]]
## [1] 2 3 4 5
## 
## $`25-100`[[2]]
## [1] 10 11 12 13
## 
## 
## $`50-100`
## $`50-100`[[1]]
## [1] 6 7 8 9
## 
## $`50-100`[[2]]
## [1] 10 11 12 13
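
Lists with the same shape as the output above can be built as follows. This is a sketch in base R; the object names `conditions` and `pairs` are illustrative and may not match the workflow's own variables.

```r
# Four replicates per condition, starting at column 2
conditions <- list(`25` = 2:5, `50` = 6:9, `100` = 10:13)

# Every pairwise combination of condition names
pair_names <- combn(names(conditions), 2, simplify = FALSE)

# For each pair, collect the two index vectors in an unnamed list
pairs <- lapply(pair_names, function(p) unname(conditions[p]))

# Name each pair "A-B"
names(pairs) <- vapply(pair_names, paste, character(1), collapse = "-")
```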

Rename the abundance columns in a simplified “Condition-Replicate” format.

##  [1] "Accession" "25-1"      "25-2"      "25-3"      "25-4"     
##  [6] "50-1"      "50-2"      "50-3"      "50-4"      "100-1"    
## [11] "100-2"     "100-3"     "100-4"
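
The names above can be generated from the condition labels and replicate count instead of being typed out. A minimal sketch, assuming four replicates per condition in column order; `conds` and `n_rep` are illustrative names.

```r
conds <- c(25, 50, 100)   # condition labels, in column order
n_rep <- 4                # replicates per condition

# "Condition-Replicate" names, preceded by the identifier column
new_names <- c("Accession",
               paste(rep(conds, each = n_rep), seq_len(n_rep), sep = "-"))
```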

Filter to keep identifiers with >50% of replicates having nonzero abundances in at least one condition.
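
A base R sketch of this filter on a toy table with two conditions. The data and object names are hypothetical; the rule is read here as "strictly more than half of the replicates are nonzero in at least one condition".

```r
# Toy abundance table: identifier column followed by replicate columns
abund <- data.frame(
  Accession = c("A", "B", "C"),
  `25-1` = c(10, 0, 0), `25-2` = c(12, 0, 5),
  `25-3` = c(0, 0, 0),  `25-4` = c(9, 3, 0),
  `50-1` = c(0, 0, 7),  `50-2` = c(0, 0, 0),
  `50-3` = c(4, 0, 3),  `50-4` = c(0, 2, 6),
  check.names = FALSE, stringsAsFactors = FALSE
)
conditions <- list(`25` = 2:5, `50` = 6:9)

# Fraction of nonzero replicates per identifier, one column per condition
frac_nonzero <- sapply(conditions, function(idx) rowMeans(abund[, idx] > 0))

# Keep identifiers that exceed 50% in at least one condition
kept <- abund[apply(frac_nonzero > 0.5, 1, any), ]
```

Here "A" passes on condition 25 (3 of 4 nonzero), "C" passes on condition 50 (3 of 4), and "B" is dropped (1 of 4 in both).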

Perform a log2-transformation of abundances to stabilize variance. The base R function log2() returns “-Inf” for zero values. These instances are changed to “NA” following the transformation.
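
In base R the transformation and the replacement of “-Inf” can be written as below; the matrix `abund` is a toy example.

```r
# Toy abundance matrix containing one zero
abund <- matrix(c(8, 0, 4, 16), nrow = 2)

# log2(0) evaluates to -Inf
log_abund <- log2(abund)

# Recode the infinite values as missing
log_abund[is.infinite(log_abund)] <- NA
```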

Impute missing values (“NA”) using a conditional strategy with the imp4p package. Iterate by condition and check whether each identifier has reliable quantitation.

  • If at least one replicate has a nonzero abundance, impute other replicates with values drawn from a normal distribution centered on the mean of nonzero replicates.

  • If no replicate has a nonzero abundance, impute with small values drawn from a normal distribution centered on the 25th percentile of abundances.
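
The conditional logic can be illustrated with a simplified base R stand-in. This sketch does not call imp4p and is not its algorithm: it only shows the two branches for a single condition (partially observed versus not observed at all), and the spread `imp_sd` is an arbitrary, hypothetical choice.

```r
set.seed(1)

# Log2 abundances for one condition (rows = identifiers, NA = missing)
x <- rbind(c(20.1, NA, 19.8, 20.3),   # partially observed
           c(NA, NA, NA, NA))         # not observed in this condition
x_obs <- x                            # keep the observed values for reference

# Hypothetical spread for the drawn values
imp_sd <- 0.2

# Low-abundance anchor: 25th percentile of all observed values
low_center <- quantile(x_obs, 0.25, na.rm = TRUE)

for (i in seq_len(nrow(x))) {
  miss <- is.na(x[i, ])
  if (!any(miss)) next
  # Observed replicates present: center on their mean; otherwise use the anchor
  center <- if (all(miss)) low_center else mean(x_obs[i, ], na.rm = TRUE)
  x[i, miss] <- rnorm(sum(miss), mean = center, sd = imp_sd)
}
```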

3.3.1 Pairwise t-test

Perform a t-test on each identifier for defined condition pairs.

  • By default, a two-sided, equal-variance t-test is run with Benjamini-Hochberg FDR correction.
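
A minimal sketch of the row-wise test with base R's `t.test()` and `p.adjust()`, using simulated log2 abundances. The workflow's own function is not shown in this document, so the names here are illustrative.

```r
set.seed(1)

# Simulated log2 abundances: 5 identifiers, 4 replicates in each condition
x <- matrix(rnorm(40, mean = 20), nrow = 5)
grp_a <- 1:4
grp_b <- 5:8

# Two-sided, equal-variance t-test per identifier (row)
p_raw <- apply(x, 1, function(r) {
  t.test(r[grp_a], r[grp_b], var.equal = TRUE)$p.value
})

# Benjamini-Hochberg adjustment across identifiers
p_adj <- p.adjust(p_raw, method = "BH")
```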

3.3.2 One-way ANOVA

Perform a one-way analysis of variance (ANOVA) on each identifier across conditions.

  • By default, a one-way ANOVA is run for all unique conditions with Benjamini-Hochberg FDR correction.
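
The same pattern applies to the ANOVA. The sketch below fits a linear model per identifier on simulated data and reads the p-value for the condition term; it is an illustration in base R, not the workflow's own function.

```r
set.seed(1)

# Simulated log2 abundances: 3 identifiers, 4 replicates x 3 conditions
x <- matrix(rnorm(36, mean = 20), nrow = 3)
condition <- factor(rep(c("25", "50", "100"), each = 4))

# One-way ANOVA per identifier; the first row of the table is the condition term
p_raw <- apply(x, 1, function(r) {
  anova(lm(r ~ condition))[["Pr(>F)"]][1]
})

# Benjamini-Hochberg adjustment across identifiers
p_adj <- p.adjust(p_raw, method = "BH")
```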

3.3.3 Fold change

Calculate fold change using the mean replicate abundance for each condition.

  • Subtracting log2-transformed values is equivalent to dividing the non-transformed values and then transforming: \(\log_2(B) - \log_2(A) = \log_2(B/A)\)
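
A small worked example in base R, with invented log2 abundances for two conditions of four replicates each:

```r
# Toy log2 abundances: columns 1-4 are condition A, columns 5-8 are condition B
log_abund <- rbind(c(10, 10, 10, 10, 11, 11, 11, 11),
                   c(8, 8, 8, 8, 6, 6, 6, 6))
cond_a <- 1:4
cond_b <- 5:8

# log2 fold change (B relative to A) from the mean replicate abundances
log2_fc <- rowMeans(log_abund[, cond_b]) - rowMeans(log_abund[, cond_a])

# Back-transform to a linear-scale ratio
fold_change <- 2^log2_fc
```

Because the means are taken on the log2 scale, the back-transformed ratio corresponds to a ratio of geometric means of the untransformed abundances.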

3.3.4 Clustering

Unsupervised

4 PlotLFQ

4.1 Packages

Load required packages into R session.

4.2 Functions

## SHA-1 hash of file is 3210231e1ee84ce77a0cbb73f2ed57b20bfa6d93

4.4 PCA

Principal component analysis (PCA)

4.6 Trend Profiles

Visualization of results from hierarchical clustering.

5 AnnotateLFQ

5.1 Packages

Load required packages into R session.

5.2 Functions

## SHA-1 hash of file is 34d87fddd0d3e4a15a837ac42bbc820b3df5a648

5.3 UniProtKB API

Given a list of protein accessions, query UniProtKB and retrieve the known annotations.

6 Output

6.1 Dataframes

The write_csv() function from the readr package in tidyverse is one option to save a dataframe copy in the working directory.

7 Session

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] broom_0.5.2         imp4p_0.7           norm_1.0-9.5       
##  [4] truncnorm_1.0-8     Iso_0.0-18          Biostrings_2.50.2  
##  [7] XVector_0.22.0      IRanges_2.16.0      S4Vectors_0.20.1   
## [10] BiocGenerics_0.28.0 forcats_0.4.0       stringr_1.4.0      
## [13] dplyr_0.8.3         purrr_0.3.2         readr_1.3.1        
## [16] tidyr_0.8.3         tibble_2.1.3        ggplot2_3.2.0      
## [19] tidyverse_1.2.1     devtools_2.1.0      usethis_1.5.1      
## [22] knitr_1.23         
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.0        pkgload_1.0.2     jsonlite_1.6     
##  [4] modelr_0.1.4      assertthat_0.2.1  cellranger_1.1.0 
##  [7] yaml_2.2.0        remotes_2.1.0     sessioninfo_1.1.1
## [10] pillar_1.4.2      backports_1.1.4   lattice_0.20-35  
## [13] glue_1.3.1        digest_0.6.20     rvest_0.3.4      
## [16] colorspace_1.4-1  plyr_1.8.4        htmltools_0.3.6  
## [19] pkgconfig_2.0.2   haven_2.1.1       zlibbioc_1.28.0  
## [22] scales_1.0.0      processx_3.4.0    generics_0.0.2   
## [25] withr_2.1.2       lazyeval_0.2.2    cli_1.1.0        
## [28] magrittr_1.5      crayon_1.3.4      readxl_1.3.1     
## [31] memoise_1.1.0     evaluate_0.14     ps_1.3.0         
## [34] fs_1.3.1          nlme_3.1-137      xml2_1.2.0       
## [37] pkgbuild_1.0.3    tools_3.5.1       prettyunits_1.0.2
## [40] hms_0.5.0         munsell_0.5.0     callr_3.3.0      
## [43] compiler_3.5.1    rlang_0.4.0       grid_3.5.1       
## [46] rstudioapi_0.10   labeling_0.3      rmarkdown_1.13   
## [49] testthat_2.1.1    gtable_0.3.0      curl_3.3         
## [52] R6_2.4.0          lubridate_1.7.4   zeallot_0.1.0    
## [55] rprojroot_1.3-2   desc_1.2.0        stringi_1.4.3    
## [58] Rcpp_1.0.1        vctrs_0.2.0       tidyselect_0.2.5 
## [61] xfun_0.8